Breaking the Black Box
The Power of Chain-of-Thought (CoT) Prompting
Land Acknowledgement
I would like to begin by acknowledging that we are on the traditional, ancestral, and unceded territory of the xʷməθkʷəy̓əm (Musqueam), Sḵwx̱wú7mesh (Squamish), and Tsleil-Waututh peoples. I am thankful to have the opportunity to live and learn on this land.
The “Black Box” Problem
Why standard LLMs fail complex tasks
- Early users treated LLMs like search engines.
- Models recall facts well, but struggle with multi-step reasoning.
- Without explicit reasoning, models often return a confident but wrong answer.
What is Chain-of-Thought (CoT)?
- A technique where we ask an AI model to explain its reasoning before giving the answer.
- Instead of jumping to conclusions, the model unfolds its “thought process”.
Example
Question: Roger has 5 balls. He buys 2 cans (3 balls each). Total?
| Standard Prompting | Chain-of-Thought Prompting |
|---|---|
| Output: “7” | Step 1: Roger starts with 5 balls. |
| Result: Incorrect | Step 2: 2 cans × 3 = 6 more balls. |
| | Step 3: 5 + 6 = 11. |
| | Answer: 11 (Correct) |
Why? The model uses its own output as “new context” for the next step.
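This feedback loop can be sketched in a few lines of Python. Here, `generate_step` is a hypothetical stand-in for one model decoding pass; the key point is that every emitted step is appended to the context, so later steps condition on earlier ones:

```python
# Minimal sketch of the CoT loop: each reasoning step the model emits is
# appended to the prompt, becoming "new context" for the next step.
# `generate_step` is a hypothetical stand-in for a real model call.

def solve_with_cot(question: str, generate_step) -> str:
    """Accumulate reasoning steps in the context until an answer appears."""
    context = question + "\nLet's think step by step.\n"
    while True:
        step = generate_step(context)   # the model sees ALL previous steps
        context += step + "\n"
        if step.startswith("Answer:"):
            return step

# A scripted stand-in "model" replaying the Roger example:
_steps = iter([
    "Step 1: Roger starts with 5 balls.",
    "Step 2: 2 cans x 3 = 6 more balls.",
    "Step 3: 5 + 6 = 11.",
    "Answer: 11",
])

answer = solve_with_cot(
    "Roger has 5 balls. He buys 2 cans (3 balls each). Total?",
    lambda ctx: next(_steps),
)
print(answer)  # Answer: 11
```

With a real model, `generate_step` would be an API call that decodes until the next newline; the loop structure stays the same.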
The LEGO Principle
1. Logic Decomposition
- Like building a LEGO castle: break massive problems into smaller, manageable “blocks”.
- Build the baseplate \(\rightarrow\) Walls \(\rightarrow\) Towers.
2. External Memory
- Write down a reasoning step, like placing a LEGO brick firmly into the baseplate.
- Once that brick is placed, the model can “see” it.
3. Error Catching
- If you put a 2x4 brick where a 2x2 brick belongs, you notice immediately.
Variations of the CoT Pattern
1. Zero-Shot CoT
The “Magic” Phrase: “Let’s think step by step.” No examples required.
Image Source: Kojima et al. (2022)
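In code, zero-shot CoT is just a prompt template. A minimal sketch (the formatting is illustrative; the trigger phrase is from Kojima et al., 2022):

```python
# Zero-shot CoT: no exemplars needed -- just append the trigger phrase
# before the model answers (Kojima et al., 2022).

COT_TRIGGER = "Let's think step by step."

def zero_shot_cot_prompt(question: str) -> str:
    """Wrap a question so the model reasons before answering."""
    return f"Q: {question}\nA: {COT_TRIGGER}"

prompt = zero_shot_cot_prompt(
    "Roger has 5 balls. He buys 2 cans (3 balls each). Total?"
)
print(prompt)
```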
Variations of the CoT Pattern
2. Few-Shot CoT
Providing Exemplars. Showing the model 2-3 solved problems with worked-out logic.
Image Source: Kojima et al. (2022)
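A few-shot CoT prompt just prepends solved exemplars whose answers spell out the reasoning. A sketch with made-up exemplar text:

```python
# Few-shot CoT: show the model 2-3 solved problems with worked-out logic,
# then append the new question. Exemplar wording here is illustrative.

EXEMPLARS = [
    ("There are 3 cars and 2 more arrive. How many cars?",
     "Start with 3 cars. 2 more arrive, so 3 + 2 = 5. The answer is 5."),
    ("Leah had 32 chocolates and ate 8. How many are left?",
     "Start with 32. Eating 8 leaves 32 - 8 = 24. The answer is 24."),
]

def few_shot_cot_prompt(question: str) -> str:
    """Build a prompt: worked exemplars first, new question last."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXEMPLARS)
    return f"{shots}\n\nQ: {question}\nA:"

print(few_shot_cot_prompt(
    "Roger has 5 balls. He buys 2 cans (3 balls each). Total?"
))
```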
Variations of the CoT Pattern
3. Self-Consistency
The “Majority Vote.” Sample several reasoning paths (e.g., 5); if 4 lead to “11” and 1 leads to “7”, choose 11.
Image Source: Wang et al. (2022)
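The aggregation step of self-consistency is a plain majority vote over the final answers. A sketch with canned path outputs standing in for real sampled generations:

```python
from collections import Counter

# Self-consistency (Wang et al., 2022): sample several diverse reasoning
# paths (temperature > 0), extract each path's final answer, and keep the
# most common one. The sampled answers below are canned for illustration.

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent final answer across sampled paths."""
    return Counter(answers).most_common(1)[0][0]

# Final answers extracted from 5 hypothetical reasoning paths:
sampled_answers = ["11", "11", "7", "11", "11"]
print(majority_vote(sampled_answers))  # 11
```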
Advantages & Trade-offs
The Benefits
- Transparency: Essential for Medical Diagnosis and Financial Decisions.
- Precision: Stays “on track” during long generations.
The Costs
- Resource Overhead: Higher token count = Higher cost.
- Latency: “Thinking” takes time; not ideal for real-time applications.
Thank you!
References:
- Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. arXiv preprint arXiv:2205.11916.
- Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171
- Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E. H., Le, Q. V., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint arXiv:2201.11903.
- Chain-of-Thought (CoT) Prompting. (n.d.). Prompting Guide. Retrieved January 7, 2026, from https://www.promptingguide.ai/techniques/cot/